Enriching Statistical Translation Models Using a Domain-Independent Multilingual Lexical Knowledge Base
نویسندگان
چکیده
This paper presents a method for improving phrase-based Statistical Machine Translation systems by enriching the original translation model with information derived from a multilingual lexical knowledge base. The method proposed exploits the Multilingual Central Repository (a group of linked WordNets from different languages), as a domain-independent knowledge database, to provide translation models with new possible translations for a large set of lexical tokens. Translation probabilities for these tokens are estimated using a set of simple heuristics based on WordNet topology and local context. During decoding, these probabilities are softly integrated so they can interact with other statistical models. We have applied this type of domain-independent translation modeling to several translation tasks obtaining a moderate but significant improvement in translation quality consistently according to a number of standard automatic evaluation metrics. This improvement is especially remarkable when we move to a very different domain, such as the translation of Biblical texts.
منابع مشابه
Multilingual Lexical Representation
The approach to multilingual lexical representation developed as part of the ACQUILEX Lexical Knowledge Base (LKB) discussed with specific reference to complex translation equivalence. The treatment described provides a lexicalist account of translation mismatches in terms of translation links which capture cross-linguistic generalizations across sets of semantically related lexical items, and ...
متن کاملImproving Machine Translation through Linked Data
With the ever increasing availability of linked multilingual lexical resources, there is a renewed interest in extending Natural Language Processing (NLP) applications so that they can make use of the vast set of lexical knowledge bases available in the Semantic Web. In the case of Machine Translation, MT systems can potentially benefit from such a resource. Unknown words and ambiguous translat...
متن کاملTowards Universal Multilingual Knowledge Bases
Lexical, ontological, as well as encyclopedic knowledge is increasingly being encoded in machine-readable form. This paper deals with knowledge representation in multilingual settings. It begins by proposing a generic graph-based knowledge base framework, and then, in three case studies, explains how preexisting knowledge can be cast into this framework. The first case study involves enriching ...
متن کاملThe Habanera Lexical Knowledge Base Management System
Habanera is a multipurpose multilingual lexical knowledge base that is developed at CRL to be used as a central repository of multilingual lexical data. The knowledge base contains a set of dictionaries and relations between entries, within a dictionary (e.g., synonymy) as well as between entries of different dictionaries (e.g., translation). The format of monolingual lexical entries is left re...
متن کاملA multilingual ontology matcher
State-of-the-art multilingual ontology matchers use machine translation to reduce the problem to the monolingual case. We investigate an alternative, self-contained solution based on semantic matching where labels are parsed by multilingual natural language processing and then matched using a language-independent knowledge base acting as an interlingua. As the method relies on the availability ...
متن کامل